URL Metadata & OpenGraph Extractor
Pricing
$1.00 / 1,000 URL reads
Reads a page's own public head tags (OpenGraph, Twitter card, title, description, canonical, favicon, and language) for clean link previews and RAG ingestion. Respects robots.txt by default. Billed only per URL successfully read.
Give it a list of page URLs and get back the metadata each page publishes for
previews: OpenGraph (og:*), Twitter card (twitter:*), <title>, meta
description, canonical link, declared favicon, and page language. Clean,
flat rows, built for link previews and for feeding RAG pipelines with consistent
per-link metadata.
Input
- URLs: one per line.
- Respect robots.txt: when on (default), the host's robots.txt is checked and disallowed URLs are skipped.
- Max delivered URLs: cap on billed rows (0 = no cap).
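As a rough illustration of what the Respect robots.txt option implies, the check can be sketched with Python's standard `urllib.robotparser`. This is a minimal sketch, not the actor's code: the function name `allowed_by_robots` and the User-Agent string `ExampleMetadataBot` are hypothetical, and the robots.txt text is passed in directly so the example makes no network calls.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt and User-Agent, for illustration only.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
AGENT = "ExampleMetadataBot"

urls = ["https://example.com/", "https://example.com/private/report"]
to_read = [u for u in urls if allowed_by_robots(ROBOTS, u, AGENT)]
skipped = [u for u in urls if not allowed_by_robots(ROBOTS, u, AGENT)]
```

Here `to_read` keeps only the homepage and `skipped` holds the disallowed URL, which mirrors the skip behavior described for the default setting.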
Output
One row per URL: url, finalUrl, httpStatus, title, description,
canonical, the og* fields, the twitter* fields, favicon, lang, plus
provenance (sourceUrl, retrievedAt, confidence, dataSource).
How it works
Sites publish these head tags specifically so other tools can render previews.
The actor fetches each page politely with a declared User-Agent, reads only the
head, and copies the tags verbatim. Relative og:image, canonical, and favicon
URLs are resolved to absolute against the page URL; nothing else is transformed,
and a tag the page does not declare is null, never invented. A URL that
robots.txt disallows, or that fails to fetch, is written to the free rejected
dataset and is not billed. A site owner can ask us to skip their domain at
https://ponodata.com/opt-out; opted-out hosts are skipped and never charged.
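The head-only, copy-verbatim approach described above can be approximated with the Python standard library. This is a sketch under stated assumptions rather than the actor's implementation: the class `HeadTagParser`, the function `extract_metadata`, and the output keys `ogTitle`, `ogImage`, and `twitterCard` are hypothetical names chosen for illustration. It stops collecting once the head ends, resolves only the image, canonical, and favicon URLs to absolute, and returns `None` for anything the page does not declare.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadTagParser(HTMLParser):
    """Collects <title>, <meta>, <link>, and <html lang>, ignoring the body."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}     # e.g. "og:title" -> content, copied verbatim
        self.links = {}    # "canonical" / "icon" -> href as declared
        self.title = None
        self.lang = None
        self._in_title = False
        self._done = False  # set once </head> or <body> is seen

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        a = dict(attrs)
        if tag == "html":
            self.lang = a.get("lang")
        elif tag == "body":
            self._done = True
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                self.meta.setdefault(key.lower(), a["content"])
        elif tag == "link" and a.get("href"):
            rels = (a.get("rel") or "").lower().split()
            for rel in ("canonical", "icon"):
                if rel in rels:
                    self.links.setdefault(rel, a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "head":
            self._done = True

    def handle_data(self, data):
        if self._in_title and not self._done:
            self.title = (self.title or "") + data


def extract_metadata(html: str, page_url: str) -> dict:
    parser = HeadTagParser()
    parser.feed(html)
    parser.close()

    def absolute(value):
        # Resolve relative URLs against the page URL; undeclared stays None.
        return urljoin(page_url, value) if value else None

    return {
        "url": page_url,
        "title": parser.title.strip() if parser.title else None,
        "description": parser.meta.get("description"),
        "canonical": absolute(parser.links.get("canonical")),
        "ogTitle": parser.meta.get("og:title"),          # hypothetical key name
        "ogImage": absolute(parser.meta.get("og:image")),  # hypothetical key name
        "twitterCard": parser.meta.get("twitter:card"),  # hypothetical key name
        "favicon": absolute(parser.links.get("icon")),
        "lang": parser.lang,
    }
```

Calling `extract_metadata(html, "https://example.com/dir/index.html")` on a page whose head declares `og:image` as `/img/a.png` would yield `https://example.com/img/a.png` for that field, while a description tag that appears only in the body is ignored and reported as `None`.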
Billing
Pay per URL successfully read. Robots-disallowed and failed URLs cost nothing.
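For budgeting, the arithmetic can be sketched as follows, assuming the listed price of $1.00 per 1,000 URL reads is applied linearly per successfully read URL. The function `estimated_cost` and its parameters are hypothetical, and the platform's actual rounding may differ.

```python
from decimal import Decimal

PRICE_PER_1000_READS = Decimal("1.00")  # listed price; assumed linear per URL


def estimated_cost(total_urls: int, robots_disallowed: int = 0, failed: int = 0) -> Decimal:
    """Estimate the charge in USD: only successfully read URLs are billable."""
    billable = total_urls - robots_disallowed - failed
    if billable < 0:
        raise ValueError("skipped and failed counts exceed total URLs")
    return PRICE_PER_1000_READS * billable / 1000


# 2,500 submitted URLs, 300 disallowed by robots.txt, 200 failed to fetch:
# 2,000 billable reads.
cost = estimated_cost(2500, robots_disallowed=300, failed=200)
```

Under these assumptions `cost` comes to 2 dollars, since the 500 skipped or failed URLs are not charged.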
Sample output
A real run reading each page's own public head tags (one row per URL):
| URL | title | description | OG type |
|---|---|---|---|
| https://www.cloudflare.com | Cloudflare: Build for the… | Welcome to Cloudflare - Powering … | website |
| https://stripe.com | Stripe / Financial Infras… | Stripe is a financial services pl… | website |
| https://www.python.org | Welcome to Python.org | The official home of the Python P… | website |
| https://kubernetes.io | Kubernetes | Kubernetes, also known as K8s, is… | website |
Every row carries a sourceUrl (the page read), for example https://www.cloudflare.com. Pages that return no metadata are routed to the free rejected dataset.
See also
More clean, pay-only-for-results data tools from Pono Data:
- Sitemap Extractor - every URL from any sitemap
- Bulk DNS Lookup - DNS records plus SPF, DMARC, and CAA
- Domain WHOIS via RDAP - registration data, structured from RDAP
Full catalog: https://apify.com/thoob